Querying XML Data by Parallel Holistic Twig Joins Processing

Authors

  • Imam Machdi
  • Toshiyuki Amagasa
  • Hiroyuki Kitagawa

Abstract

XML has become the de facto standard for data representation and exchange over the Internet. However, as XML documents grow in size and XML queries become more complex to evaluate, the performance of existing query processing in a single, centralized environment deteriorates. Parallelism is therefore a viable solution. One way to exploit parallelism is to adopt a PC cluster system, in which tens of commodity PCs are interconnected by a high-speed network; such systems have recently become popular and commercially available. The core operation of XML query processing is finding query patterns in XML data. One technique for finding query patterns is the holistic twig join, an important family of algorithms that allows queries consisting of branches to be processed holistically. To the best of our knowledge, parallel query processing based on holistic twig joins has not been studied intensively. At this stage of the project, we focus on methods for partitioning XML data and distributing it to the cluster PCs. The first method, Grid Metadata for XML (GMX), partitions XML data obtained from heterogeneous XML documents to support inter-query parallelism. The second method, streams-based partitioning, partitions XML data on the fly to support intra-query parallelism.
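To make the idea of partitioning XML data for intra-query parallelism more concrete, here is a minimal Python sketch under stated assumptions. It labels elements with the (start, end, level) region encoding that holistic twig join algorithms such as TwigStack operate on, then applies a hypothetical partitioning scheme of our own: each cluster node receives a contiguous run of the root's child subtrees, and the elements there are split into per-tag streams. This is not GMX or the authors' streams-based partitioning, whose details the abstract does not give. The names `Region`, `encode`, `partition_streams`, and `count_ad_pairs` are illustrative, and the ancestor-descendant check is a naive pairwise test rather than a holistic twig join.

```python
import bisect
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Region label (start, end, level) of one element."""
    tag: str
    start: int
    end: int
    level: int


def encode(root):
    """Label every element with (start, end, level) in one depth-first pass."""
    regions = []
    counter = 0

    def visit(elem, level):
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child, level + 1)
        regions.append(Region(elem.tag, start, counter, level))
        counter += 1

    visit(root, 0)
    return sorted(regions, key=lambda r: r.start)  # document order


def is_ancestor(a, d):
    """a is a proper ancestor of d iff a's region strictly contains d's."""
    return a.start < d.start and d.end < a.end


def partition_streams(regions, num_parts):
    """HYPOTHETICAL scheme: give each partition a contiguous run of the root's
    child subtrees and split its elements into per-tag streams.
    The document root (level 0) is not assigned to any partition."""
    top_starts = [r.start for r in regions if r.level == 1]
    parts = [{} for _ in range(num_parts)]
    for r in regions:
        if r.level == 0:
            continue
        # Index of the top-level subtree whose region contains r.
        k = bisect.bisect_right(top_starts, r.start) - 1
        p = k * num_parts // len(top_starts)
        parts[p].setdefault(r.tag, []).append(r)  # streams stay sorted by start
    return parts


def count_ad_pairs(streams, anc_tag, desc_tag):
    """Naive evaluation of the path anc_tag//desc_tag on one partition."""
    return sum(
        1
        for a in streams.get(anc_tag, [])
        for d in streams.get(desc_tag, [])
        if is_ancestor(a, d)
    )


if __name__ == "__main__":
    doc = ET.fromstring(
        "<lib><book><sec><title/></sec></book><book><title/></book>"
        "<book><sec><sec><title/></sec></sec></book></lib>"
    )
    parts = partition_streams(encode(doc), 2)
    # Each partition could be handed to a different cluster PC.
    print([count_ad_pairs(p, "book", "title") for p in parts])  # [2, 1]
```

Cutting only at subtree boundaries means any match that does not bind a pattern node to the document root lies entirely inside one partition, so each node can evaluate its streams independently and the per-partition results simply add up. Matches that involve the root would need separate handling, which this sketch omits.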

Similar Resources

Holistic Twig Joins on Indexed XML Documents

Finding all the occurrences of a twig pattern specified by a selection predicate on multiple elements in an XML document is a core operation for efficient evaluation of XML queries. Holistic twig join algorithms were proposed recently as an optimal solution when the twig pattern only involves ancestor-descendant relationships. In this paper, we address the problem of efficient processing of holi...

StreamTX: extracting tuples from streaming XML data

We study the problem of extracting flattened tuple data from streaming, hierarchical XML data. Tuple-extraction queries are essentially XML pattern queries with multiple extraction nodes. Their typical applications include mapping-based XML transformation and integrated (set-based) processing of XML and relational data. Holistic twig joins are known for the optimal matching of XML pattern queri...

TwigStackList-: A Holistic Twig Join Algorithm for Twig Query with Not-Predicates on XML Data

As businesses and enterprises generate and exchange XML data more often, there is an increasing need for searching and querying XML data. Much research has been done on matching XML twig queries. However, as far as we know, very little work has examined the efficient processing of XML twig queries with not-predicates. In this paper, we propose a novel holistic twig join algorithm, called Twig...

TwigX-Guide: An Efficient Twig Pattern Matching System Extending DataGuide Indexing and Region Encoding Labeling

With the rapid emergence of XML as an enabler for data exchange and data transfer over the Web, querying XML data has become a major concern. In this paper, we present a hybrid system, TwigX-Guide, an extension of the well-known DataGuide index and region encoding labeling to support twig query processing. With TwigX-Guide, a complex query can be decomposed into a set of path queries, which are...

Recursive Twig Pattern Query

XQuery is a language for querying XML data that is widely used on the Internet. In XQuery, users can define recursive functions for querying and processing XML data. XML twig pattern querying is considered a core operation for querying XML data and has been studied intensively in recent years. More powerful recursive queries can be achieved via combining user-defined recursive function and twig...

Publication date: 2008